2,207 research outputs found

Reinforced Axial Refinement Network for Monocular 3D Object Detection

Author: A Geiger
A Saxena
B Pepik
HA Alhaija
M Bertozzi
ML Littman
RS Sutton
S Levine
V Mnih
Publication date: 31/08/2020

Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image. This is an ill-posed problem whose major difficulty lies in the information lost by depth-agnostic cameras. Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space. To improve the efficiency of sampling, we propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step. This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it. The proposed framework, Reinforced Axial Refinement Network (RAR-Net), serves as a post-processing stage which can be freely integrated into existing monocular 3D detection methods and improves performance on the KITTI dataset at a small extra computational cost. Comment: Accepted by ECCV 202

arXiv.org e-Print Archive

Crossref

Active MR k-space Sampling with Reinforcement Learning

Author: B Gözcü
B Zhu
F Chen
GP Zientara
J Schlemper
K Lønning
LP Kaelbling
LP Panych
M Lustig
M Seeger
ML Puterman
P Zhang
RS Sutton
S Wang
S-S Yoo
Y Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 07/10/2020

Deep learning approaches have recently shown great promise in accelerating magnetic resonance image (MRI) acquisition. The majority of existing work has focused on designing better reconstruction models given a pre-determined acquisition trajectory, ignoring the question of trajectory optimization. In this paper, we focus on learning acquisition trajectories given a fixed image reconstruction model. We formulate the problem as a sequential decision process and propose the use of reinforcement learning to solve it. Experiments on a large-scale public MRI dataset of knees show that our proposed models significantly outperform the state of the art in active MRI acquisition over a large range of acceleration factors. Comment: Presented at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 202

arXiv.org e-Print Archive

Crossref

Dynamic programming with approximation function for nurse scheduling

Author: B Cheang
B Maenhout
DP Bertsekas
H-J Schuetz
J Bergh Van den
M Dorigo
M Elshafei
ML Puterman
P Causmaecker De
R Bellman
RS Sutton
WB Powell
Publication venue: Springer Science and Business Media LLC
Publication date: 25/12/2016

Although dynamic programming could ideally solve any combinatorial optimization problem, the curse of dimensionality of the search space seriously limits its application to large optimization problems. For example, only a few papers in the literature have reported the application of dynamic programming to workforce scheduling problems. This paper investigates approximate dynamic programming to tackle nurse scheduling problems of a size that dynamic programming cannot tackle in practice. Nurse scheduling is one of the problems within workforce scheduling that has been tackled with a considerable number of algorithms, particularly meta-heuristics. Experimental results indicate that approximate dynamic programming is a suitable method to solve this problem effectively.

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham

Probabilistic inference for determining options in reinforcement learning

Author: Christian Daniel
Christopher M Bishop
CJCH Watkins
E Theodorou
Gerhard Neumann
Herke van Hoof
J Morimoto
Jan Peters
LE Baum
M Lagoudakis
ML Puterman
RS Sutton
TG Dietterich
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi-Markov decision process (SMDP) setting and the option framework, we propose a model which aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently require manual specification of components such as the sub-policies, we present an algorithm which infers all relevant components of the option framework from data. Furthermore, the proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks. We present results on SMDPs with discrete as well as continuous state-action spaces. The results show that the presented algorithm can combine simple sub-policies to solve complex tasks and can improve learning performance on simpler tasks.

University of Lincoln Institutional Repository

TUbiblio

Crossref

MPG.PuRe

Preference-Based Monte Carlo Tree Search

Author: A Rimmel
CB Browne
CS Lee
D Silver
J Fürnkranz
JD Knowles
L Kocsis
LL Thurstone
ML Puterman
P Auer
R Busa-Fekete
RS Sutton
T Pepels
Y Yue
Publication venue: Springer Science and Business Media LLC
Publication date: 17/07/2018

Monte Carlo tree search (MCTS) is a popular choice for solving sequential anytime problems. However, it depends on a numeric feedback signal, which can be difficult to define. Real-time MCTS is a variant which may only rarely encounter states with an explicit, extrinsic reward. To deal with such cases, the experimenter has to supply an additional numeric feedback signal in the form of a heuristic, which intrinsically guides the agent. Recent work has shown evidence that in different areas the underlying structure is ordinal and not numerical. Hence erroneous and biased heuristics are inevitable, especially in such domains. In this paper, we propose an MCTS variant which only depends on qualitative feedback, and therefore opens up new applications for MCTS. We also find indications that translating absolute into ordinal feedback may be beneficial. Using a puzzle domain, we show that our preference-based MCTS variant, which only receives qualitative feedback, is able to reach a performance level comparable to a regular MCTS baseline, which obtains quantitative feedback. Comment: To be published

arXiv.org e-Print Archive

Crossref

Behavioral determinants as predictors of return to work after long-term sickness absence: an application of the theory of planned behavior

Background: The aim of this prospective, longitudinal cohort study was to analyze the association between the three behavioral determinants of the theory of planned behavior (TPB) model (attitude, subjective norm and self-efficacy) and the time to return-to-work (RTW) in employees on long-term sick leave. Methods: The study was based on a sample of 926 employees on sickness absence (maximum duration of 12 weeks). The employees filled out a baseline questionnaire and were subsequently followed until the tenth month after reporting sick. The TPB determinants were measured at baseline. Work attitude was measured with a Dutch language version of the Work Involvement Scale. Subjective norm was measured with a self-structured scale reflecting a person's perception of social support and social pressure. Self-efficacy was measured with the three subscales of a standardised Dutch version of the general self-efficacy scale (ALCOS): willingness to expend effort in completing the behavior, persistence in the face of adversity, and willingness to initiate behavior. Cox proportional hazards regression analyses were used to identify behavioral determinants of the time to RTW. Results: Median time to RTW was 160 days. In the univariate analysis, all potential prognostic factors were significantly associated (P < 0.15) with time to RTW: work attitude, social support, and the three subscales of self-efficacy. The final multivariate model with time to RTW as the predicted outcome included work attitude, social support and willingness to expend effort in completing the behavior as significant predictive factors. Conclusions: This prospective, longitudinal cohort study showed that work attitude, social support and willingness to expend effort in completing the behavior are significantly associated with a shorter time to RTW in employees on long-term sickness absence.
This provides suggestive evidence for the relevance of behavioral characteristics in the prediction of duration of sickness absence. Addressing the behavioral determinants may be a promising approach in the development of interventions focusing on RTW in employees on long-term sick leave.

CiteSeerX

Crossref

Proceedings - University of Groningen

University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

University of Groningen Digital Archive

Dissertations of the University of Groningen

Dimension reduction for systems with slow relaxation

Author: A Amir
A Chorin
A Crisanti
AH Jazwinski
AJ Chorin
AJ Majda
B Oded
D Comeau
D Giannakis
D Givon
D Kondrashov
D Mackay
D Venturi
E Darve
E Ott
E Takens
F Lu
G Walker
GU Yule
H Mori
HM Arnold
J Harlim
J-P Bouchaud
JJ Waterfall
JM Restrepo
Juan M. Restrepo
K Kawasaki
K Matan
KS Brown
M Budisic
M Fingas
M Kutner
MD Chekroun
MF Fingas
MK Berkenbusch
MK Transtrum
ML Spaulding
OG Sutton
P Flajolet
P Stinis
PK Dixon
R Kubo
R Vautard
R Zwanzig
Raman C. Venkataramani
RR Coifman
Shankar C. Venkataramani
T Berry
W Stiver
Publication venue: Springer Science and Business Media LLC
Publication date: 27/02/2017

We develop reduced, stochastic models for high-dimensional, dissipative dynamical systems that relax very slowly to equilibrium and can encode long-term memory. We present a variety of empirical and first-principles approaches for model reduction, and build a mathematical framework for analyzing the reduced models. We introduce the notions of universal and asymptotic filters to characterize 'optimal' model reductions for sloppy linear models. We illustrate our methods by applying them to the practically important problem of modeling evaporation in oil spills. Comment: 48 pages, 13 figures. Paper dedicated to the memory of Leo Kadanoff

arXiv.org e-Print Archive

Crossref

The University of Arizona

Assessing the role of undetected colonization and isolation precautions in reducing Methicillin-Resistant Staphylococcus aureus transmission in intensive care units

Author: AJ Sutton
Ben S Cooper
DJ Weber
ER Goodman
FA Manian
G Catalano
GM Snyder
JM Boyce
KB Kirkland
ML Forrester
PD O'Neill
Philip D O'Neill
S Saint
Sheryl L Rifas-Shiman
SS Huang
Susan S Huang
Theodore Kypraios
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Screening and isolation are central components of hospital methicillin-resistant Staphylococcus aureus (MRSA) control policies. Their prevention of patient-to-patient spread depends on minimizing undetected and unisolated MRSA-positive patient days. Estimating these MRSA-positive patient days and the reduction in transmission due to isolation presents a major methodological challenge, but is essential for assessing both the value of existing control policies and the potential benefit of new rapid MRSA detection technologies. Recent methodological developments have made it possible to estimate these quantities using routine surveillance data. Methods: Colonization data from admission and weekly nares cultures were collected from eight single-bed adult intensive care units (ICUs) over 17 months. Detected MRSA-positive patients were isolated using single rooms and barrier precautions. Data were analyzed using stochastic transmission models and model fitting was performed within a Bayesian framework using a Markov chain Monte Carlo algorithm, imputing unobserved MRSA carriage events. Results: Models estimated the mean percent of colonized-patient-days attributed to undetected carriers as 14.1% (95% CI (11.7, 16.5)) averaged across ICUs. The percent of colonized-patient-days attributed to patients awaiting results averaged 7.8% (6.2, 9.2). Overall, the ratio of estimated transmission rates from unisolated MRSA-positive patients and those under barrier precautions was 1.34 (0.45, 3.97), but varied widely across ICUs. Conclusions: Screening consistently detected >80% of colonized-patient-days. Estimates of the effectiveness of barrier precautions showed considerable uncertainty, but in all units except burns/general surgery and one cardiac surgery ICU, the best estimates were consistent with reductions in transmission associated with barrier precautions.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Oxford University Research Archive

Enhancement drugs: are there limits to what we should enhance and why?

Author: CA Soderstrom
CJA Morgan
D Goldman
DF Sutton
F Smit
G Kanayama
H Greely
IH Baek
J Rehm
M Di Forti
M Lader
MJ Brownstein
ML Eloi-Stiven
Morten Hesse
NC Stefanis
SA Ross
W Hall
Publication venue: BioMed Central
Publication date: 01/01/2010

Substances such as alcohol, opiates and cannabis have been used by humans for millennia. Today, a much wider range of substances is used for a range of purposes, including the enhancement of performance during university studies, sexual experiences, sports, exercise, celebrations, socializing and the experience of art and music. Substance use is also associated with a range of harmful effects to the individual and society as a whole. Prohibitions, regulation, prevention and treatment have all been used to protect against this harm. In this commentary, it is argued that public health interventions should target relevant harms rather than evaluate which aspects of human endeavors and experiences should be enhanced and which should not. It is argued that interventions should directly target the harmful effects, using the best available evidence. Two examples are given of substances that may be altered to prevent serious harm: one for alcohol and one for cannabis. In the case of alcohol, the addition of dissolved oxygen could reduce both the risk of accidents and the risk of liver damage associated with alcohol consumption. In the case of cannabis, there is a strong indication that reducing the Δ-tetrahydrocannabinol content and increasing the cannabidiol content could reduce the risk of psychoses and the addiction associated with its use. The aim of this article is to show that responsible regulation should not necessarily be restricted to preventing use and/or (in the case of alcohol) reducing the amounts and frequency of use, but should also aim to include a range of other strategies that could reduce the burden of illness associated with illicit substance use.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Importance sampling in reinforcement learning with an estimated behavior policy

Author: A Dvoretzky
B Delyon
C Gelada
CD Manning
CJ Oates
D Silver
E Greensmith
K Hirano
M Henmi
ML Puterman
PC Austin
PR Rosenbaum
Q Liu
R Bellman
RJ Williams
RS Sutton
RY Rubinstein
S Mahadevan
SP Singh
V Mnih
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2021

Crossref

Edinburgh Research Explorer